An embedded word training procedure for connected digit recognition
Abstract
The "conventional" way of obtaining word reference patterns for connected word recognition systems is to use isolated word patterns and to rely on the dynamics of the matching algorithm to account for the differences in connected speech. Connected word recognition based on such an approach tends to become unreliable (high error rates) when the talking rate becomes grossly incommensurate with the rate at which the isolated word training patterns were spoken. To alleviate this problem, an improved training procedure for connected word (digit) recognition is proposed in which word reference patterns from isolated occurrences of the vocabulary words are combined with word reference patterns extracted from within connected word strings, giving a robust, reliable word recognizer over all normal speaking rates. In a test of the system as a speaker-trained connected digit recognizer, with 18 talkers each speaking 40 different strings (of variable length from 2 to 5 digits), median string error rates of 0% and 2.5% were obtained for deliberately spoken and naturally spoken strings, respectively, when the string length was known. Using only isolated word training tokens, the comparable error rates were 10% and 11.3%, respectively.
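Embedded training needs word tokens cut out of connected strings. One way to sketch that step is a forced alignment: concatenate the isolated-word templates of the known digit sequence, align the connected string to that concatenation with dynamic time warping (DTW), and cut out the test frames that land on each word's template. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure: the helper names `align_known_string` and `pool_reference_tokens`, the one-scalar-per-frame features, the absolute-difference frame distance, and the local constraint (the template index advances by 0, 1 or 2 per test frame) are all hypothetical choices.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical sketch: names, scalar features and the step constraint are
# illustrative assumptions, not the procedure described in the abstract.
Frame = float


def align_known_string(
    test: Sequence[Frame], templates: List[Sequence[Frame]]
) -> Optional[List[Tuple[int, int]]]:
    """Align a connected string to the concatenation of its known word templates.

    Returns one inclusive (start, end) test-frame range per word, or None when
    the constraint admits no path (string more than ~2x faster than templates).
    """
    # With steps of at most 2, a template of >= 2 frames can never be skipped.
    if any(len(t) < 2 for t in templates):
        raise ValueError("each template needs at least 2 frames")
    ref: List[Frame] = [f for t in templates for f in t]
    label: List[int] = [k for k, t in enumerate(templates) for _ in t]
    n, m = len(test), len(ref)
    cost = [[math.inf] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    cost[0][0] = abs(test[0] - ref[0])  # path starts at the first frames
    for i in range(1, n):
        for j in range(m):
            # Template index advances by 0, 1 or 2 for each test frame.
            best, arg = math.inf, -1
            for step in (0, 1, 2):
                if j - step >= 0 and cost[i - 1][j - step] < best:
                    best, arg = cost[i - 1][j - step], j - step
            if arg >= 0:
                cost[i][j] = best + abs(test[i] - ref[j])
                back[i][j] = arg
    if cost[n - 1][m - 1] == math.inf:
        return None  # no path ends at the last frames
    # Backtrack: every test frame maps to exactly one template frame.
    path = [0] * n
    j = m - 1
    for i in range(n - 1, -1, -1):
        path[i] = j
        j = back[i][j]
    # Word labels along the path are nondecreasing, so segments are contiguous.
    segments: List[Tuple[int, int]] = []
    for k in range(len(templates)):
        frames = [i for i in range(n) if label[path[i]] == k]
        segments.append((frames[0], frames[-1]))
    return segments


def pool_reference_tokens(
    isolated: Dict[str, List[Sequence[Frame]]],
    strings: List[Tuple[str, Sequence[Frame]]],
) -> Dict[str, List[List[Frame]]]:
    """Combine isolated word tokens with tokens cut from connected strings."""
    pooled = {w: [list(t) for t in toks] for w, toks in isolated.items()}
    for digits, test in strings:
        # Hypothetical choice: the first isolated token serves as the template.
        templates = [isolated[d][0] for d in digits]
        segments = align_known_string(test, templates)
        if segments is None:
            continue  # skip strings the constraint cannot align
        for d, (s, e) in zip(digits, segments):
            pooled[d].append(list(test[s : e + 1]))
    return pooled
```

Forcing each test frame onto exactly one template frame keeps the extracted segments contiguous and non-overlapping. A practical system would use multidimensional spectral features and a proper frame distance, and the abstract does not say how the pooled isolated and embedded tokens are turned into final reference patterns (for example, by clustering or averaging), so that step is left out here.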
Similar resources
On the Application of Embedded Digit Training to Speaker Independent Connected Digit Recognition
In recent years, several algorithms have been proposed for recognizing a string of connected words (typically digits) by optimally piecing together reference patterns corresponding to the words in the string. Although the algorithms differ greatly in details of implementation, storage requirements, etc., they all have essentially the same performance in that their ability to match the unknown s...
Connected Digit Recognition with Class Specific Word Models
This work focuses on efficient use of the training material by selecting the optimal set of model topologies. We do this by training multiple word models of each word class, based on a subclassification according to a priori knowledge of the training material. We will examine classification criteria with respect to duration of the word, gender of the speaker, position of the word in the utteran...
Novel filler acoustic models for connected digit recognition
The context-dependent modeling technique is extended to include non-speech filler segments occurring between speech word units. In addition to the conventional context-dependent word or subword units, the proposed acoustic modeling provides an efficient way of accounting for the effects of the surrounding speech on the inter-word non-speech segments, especially for small vocabulary recognition task...
Asynchrony modeling for audio-visual speech recognition
We investigate the use of multi-stream HMMs in the automatic recognition of audio-visual speech. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc.) and are equivalent to product, or composite, HMMs. In this paper, we consider such models synchronized at the phone boundary level, allowing various de...
Automatic speech recognition in Mandarin for embedded platforms
In this paper, we describe a real-time automatic speech recognition system for Mandarin for low-cost embedded platforms using fixed-point digital signal processors. The hands-free, speaker-independent speech recognition system employs 41 mono-phone models for representing the sounds in Mandarin Chinese and 11 whole-word models for connected digit recognition. The system achieves greater than 98...